Theoretical insights into the optimization landscape of over-parameterized shallow neural networks
Abstract
In this paper we study the problem of learning a shallow artificial neural network that best fits a training data set. We study this problem in the over-parameterized regime, where the number of observations is smaller than the number of parameters in the model. We show that with quadratic activations the optimization landscape of training such shallow neural networks has certain favorable characteristics that allow globally optimal models to be found efficiently using a variety of local search heuristics. This result holds for arbitrary training data of input/output pairs. For differentiable activation functions we also show that gradient descent, when suitably initialized, converges at a linear rate to a globally optimal model. This result focuses on a realizable model where the inputs are chosen i.i.d. from a Gaussian distribution and the labels are generated according to planted weight coefficients. Dedicated to the memory of Maryam Mirzakhani.
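To make the realizable setting concrete, here is a minimal sketch, assuming a quadratic-activation network f(x; W) = ||Wx||^2 with unit output weights, i.i.d. Gaussian inputs, labels generated by planted weights W*, and plain gradient descent on the squared loss. The dimensions, step size, and initialization below are illustrative choices, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, n = 10, 12, 200            # input dim, hidden units, samples (illustrative)

W_star = rng.standard_normal((k, d)) / np.sqrt(d)   # planted weights W*
X = rng.standard_normal((n, d))                     # i.i.d. Gaussian inputs
y = np.sum((X @ W_star.T) ** 2, axis=1)             # y_i = sum_j (w*_j . x_i)^2

W = 0.1 * rng.standard_normal((k, d))               # small random init (assumed)
lr = 1e-3
for t in range(3000):
    Z = X @ W.T                                     # (n, k) hidden pre-activations
    resid = np.sum(Z ** 2, axis=1) - y              # f(x_i; W) - y_i
    W -= lr * (2.0 / n) * (Z * resid[:, None]).T @ X  # grad of (1/2n) sum resid^2
print("final loss:", np.mean(resid ** 2) / 2)
```

Note that the objective is non-convex in W even with quadratic activations, which is exactly why a landscape result like the one above is needed for local search to reach a global optimum.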
Similar resources
On the Power of Over-parametrization in Neural Networks with Quadratic Activation
We provide new theoretical insights on why over-parametrization is effective in learning neural networks. For a shallow network with k hidden nodes, quadratic activation, and n training data points, we show that as long as k ≥ √(2n), over-parametrization enables local search algorithms to find a globally optimal solution for general smooth and convex loss functions. Further, despite that the number of p...
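The k ≥ √(2n) condition is easiest to see through the lifted view of quadratic-activation networks. The short check below (my own construction for illustration, not code from the paper) verifies that f(x) = Σ_j (w_j · x)^2 equals the quadratic form x^T (W^T W) x, whose rank is at most k, and computes the bound for a given sample size.

```python
import numpy as np

rng = np.random.default_rng(1)
d, k = 8, 4
W = rng.standard_normal((k, d))
x = rng.standard_normal(d)

net_out = np.sum((W @ x) ** 2)        # sum_j (w_j . x)^2 with quadratic activation
lifted = x @ (W.T @ W) @ x            # same value as the quadratic form x^T (W^T W) x
assert np.isclose(net_out, lifted)

n = 100
k_min = int(np.ceil(np.sqrt(2 * n)))  # k >= sqrt(2n) per the abstract's claim
print(f"n = {n} training points -> k >= {k_min} hidden units suffice")
```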
SGD Learns Over-parameterized Networks that Provably Generalize on Linearly Separable Data
Neural networks exhibit good generalization behavior in the over-parameterized regime, where the number of network parameters exceeds the number of observations. Nonetheless, current generalization bounds for neural networks fail to explain this phenomenon. In an attempt to bridge this gap, we study the problem of learning a two-layer over-parameterized neural network, when the data is generate...
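The snippet below illustrates the setting sketched in this abstract: linearly separable labels and an over-parameterized two-layer network trained by SGD. The specific choices of a leaky-ReLU hidden layer, fixed ±1 output weights, and hinge loss are my assumptions about a common version of this setup, not necessarily this paper's exact model.

```python
import numpy as np

rng = np.random.default_rng(2)
d, k, n = 20, 100, 500                   # k*d parameters far exceed n samples

u = rng.standard_normal(d)
X = rng.standard_normal((n, d))
y = np.sign(X @ u)                       # linearly separable labels

W = 0.01 * rng.standard_normal((k, d))   # trained hidden layer
v = rng.choice([-1.0, 1.0], size=k)      # fixed +/-1 output weights (assumed)
alpha, lr = 0.1, 0.05                    # leaky-ReLU slope, step size

def forward(x, W):
    z = W @ x
    return v @ np.where(z > 0, z, alpha * z)

for epoch in range(20):
    for i in rng.permutation(n):         # one SGD sweep per epoch
        x, yi = X[i], y[i]
        if yi * forward(x, W) < 1.0:     # hinge-loss margin violated
            z = W @ x
            slope = np.where(z > 0, 1.0, alpha)
            W += lr * yi * (v * slope)[:, None] * x[None, :]  # subgradient step

train_err = np.mean([np.sign(forward(X[i], W)) != y[i] for i in range(n)])
print("training error:", train_err)
```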
PROJECTED DYNAMICAL SYSTEMS AND OPTIMIZATION PROBLEMS
We establish a relationship between general constrained pseudoconvex optimization problems and globally projected dynamical systems. A corresponding novel neural network model, which is globally convergent and stable in the sense of Lyapunov, is proposed. Both theoretical and numerical approaches are considered. Numerical simulations for three constrained nonlinear optimization problems a...
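As a concrete instance of such a projected dynamical system (a toy construction of mine, not the paper's model), the sketch below integrates dx/dt = P_Ω(x − α∇f(x)) − x by forward Euler for a box-constrained convex quadratic. Equilibria satisfy x = P_Ω(x − α∇f(x)), the standard projected-gradient fixed-point condition, so they coincide with constrained minimizers when f is convex.

```python
import numpy as np

def project_box(x, lo, hi):
    """Projection P_Omega onto the box [lo, hi]^d."""
    return np.clip(x, lo, hi)

def grad_f(x, Q, b):
    """Gradient of f(x) = 0.5 x^T Q x - b^T x."""
    return Q @ x - b

Q = np.array([[3.0, 1.0], [1.0, 2.0]])   # positive definite (illustrative)
b = np.array([1.0, -2.0])
lo, hi = -1.0, 1.0

x = np.zeros(2)
dt, alpha = 0.05, 0.5                    # Euler step, gradient scaling
for _ in range(2000):
    x = x + dt * (project_box(x - alpha * grad_f(x, Q, b), lo, hi) - x)

print("equilibrium (approx. constrained minimizer):", x)
```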
Topology and Geometry of Half-Rectified Network Optimization
The loss surface of deep neural networks has recently attracted interest in the optimization and machine learning communities as a prime example of a high-dimensional non-convex problem. Some insights were recently gained using spin glass models and mean-field approximations, but at the expense of strongly simplifying the nonlinear nature of the model. In this work, we do not make any such assump...
Application of Wavelet Neural Networks for Improving of Ionospheric Tomography Reconstruction over Iran
In this paper, a new method of ionospheric tomography is developed and evaluated based on neural networks (NN). This new method is named ITNN. In this method, a wavelet neural network (WNN) with a particle swarm optimization (PSO) training algorithm is used to solve some of the ionospheric tomography problems. The results of the ITNN method are compared with the residual minimization training neura...
Journal: CoRR
Volume: abs/1707.04926
Publication year: 2017